Goto

Collaborating Authors

 Karnes County


HGC-Herd: Efficient Heterogeneous Graph Condensation via Representative Node Herding

Ou, Fuyan, Ai, Siqi, Hu, Yulin

arXiv.org Artificial Intelligence

Heterogeneous graph neural networks (HGNNs) have demonstrated strong capability in modeling complex semantics across multi-type nodes and relations. However, their scalability to large-scale graphs remains challenging due to structural redundancy and high-dimensional node features. Existing graph condensation approaches, such as GCond, are primarily developed for homogeneous graphs and rely on gradient matching, resulting in considerable computational, memory, and optimization overhead. We propose HGC-Herd, a training-free condensation framework that generates compact yet informative heterogeneous graphs while maintaining both semantic and structural fidelity. HGC-Herd integrates lightweight feature propagation to encode multi-hop relational context and employs a class-wise herding mechanism to identify representative nodes per class, producing balanced and discriminative subsets for downstream learning tasks. Extensive experiments on ACM, DBLP, and Freebase validate that HGC-Herd attains comparable or superior accuracy to full-graph training while markedly reducing both runtime and memory consumption. These results underscore its practical value for efficient and scalable heterogeneous graph representation learning.



Is 'Hope' a person or an idea? A pilot benchmark for NER: comparing traditional NLP tools and large language models on ambiguous entities

Latifi, Payam

arXiv.org Artificial Intelligence

This pilot study presents a small-scale but carefully annotated benchmark of Named Entity Recognition (NER) performance across six systems: three non-LLM NLP tools (NLTK, spaCy, Stanza) and three general-purpose large language models (LLMs: Gemini-1.5-flash, DeepSeek-V3, Qwen-3-4B). The dataset contains 119 tokens covering five entity types (PERSON, LOCATION, ORGANIZATION, DATE, TIME). We evaluated each system's output against the manually annotated gold standard dataset using F1-score. The results show that LLMs generally outperform conventional tools in recognizing context-sensitive entities like person names, with Gemini achieving the highest average F1-score. However, traditional systems like Stanza demonstrate greater consistency in structured tags such as LOCATION and DATE. We also observed variability among LLMs, particularly in handling temporal expressions and multi-word organizations. Our findings highlight that while LLMs offer improved contextual understanding, traditional tools remain competitive in specific tasks, informing model selection.


Investigating the Impact of Observation Space Design Choices On Training Reinforcement Learning Solutions for Spacecraft Problems

Hamilton, Nathaniel, Dunlap, Kyle, Hobbs, Kerianne L

arXiv.org Artificial Intelligence

AAS 25-147 INVESTIGATING THE IMP ACT OF OBSERVATION SP ACE DESIGN CHOICES ON TRAINING REINFORCEMENT LEARNING SOLUTIONS FOR SP ACECRAFT PROBLEMS Nathaniel Hamilton *, Kyle Dunlap, and Kerianne L. Hobbs Recent research using Reinforcement Learning (RL) to learn autonomous control for spacecraft operations has shown great success. However, a recent study showed their performance could be improved by changing the action space, i.e. control outputs, used in the learning environment. This has opened the door for finding more improvements through further changes to the environment. The work in this paper focuses on how changes to the environment's observation space can impact the training and performance of RL agents learning the spacecraft inspection task. The studies are split into two groups. The first looks at the impact of sensors that were designed to help agents learn the task. The second looks at the impact of reference frames, reorienting the agent to see the world from a different perspective. The results show the sensors are not necessary, but most of them help agents learn more optimal behavior, and that the reference frame does not have a large impact, but is best kept consistent. INTRODUCTION Autonomous spacecraft operation is a critical capability for managing the growing number of space and increasingly complex operations.


Triangle Generative Adversarial Networks

Zhe Gan, Liqun Chen, Weiyao Wang, Yuchen Pu, Yizhe Zhang, Hao Liu, Chunyuan Li, Lawrence Carin

Neural Information Processing Systems

A Triangle Generative Adversarial Network ( -GAN) is developed for semisupervised cross-domain joint distribution matching, where the training data consists of samples from each domain, and supervision of domain correspondence is provided by only a few paired samples.


A study on the effects of mixed explicit and implicit communications in human-virtual-agent interactions

Campos, Ana Christina Almada, Adorno, Bruno Vilhena

arXiv.org Artificial Intelligence

Communication between humans and robots (or virtual agents) is essential for interaction and often inspired by human communication, which uses gestures, facial expressions, gaze direction, and other explicit and implicit means. This work presents an interaction experiment where humans and virtual agents interact through explicit (gestures, manual entries using mouse and keyboard, voice, sound, and information on screen) and implicit (gaze direction, location, facial expressions, and raise of eyebrows) communication to evaluate the effect of mixed explicit-implicit communication against purely explicit communication. Results obtained using Bayesian parameter estimation show that the number of errors and task execution time did not significantly change when mixed explicit and implicit communications were used, and neither the perceived efficiency of the interaction. In contrast, acceptance, sociability, and transparency of the virtual agent increased when using mixed communication modalities (88.3%, 92%, and 92.9% of the effect size posterior distribution of each variable, respectively, were above the upper limit of the region of practical equivalence). This suggests that task-related measures, such as time, number of errors, and perceived efficiency of the interaction, have not been influenced by the communication type in our particular experiment. However, the improvement of subjective measures related to the virtual agent, such as acceptance, sociability, and transparency, suggests that humans are more receptive to mixed explicit and implicit communications.


What is the best RNN-cell structure to forecast each time series behavior?

Khaldi, Rohaifa, Afia, Abdellatif El, Chiheb, Raddouane, Tabik, Siham

arXiv.org Artificial Intelligence

It is unquestionable that time series forecasting is of paramount importance in many fields. The most used machine learning models to address time series forecasting tasks are Recurrent Neural Networks (RNNs). Typically, those models are built using one of the three most popular cells: ELMAN, Long Short-Term Memory (LSTM), or Gated Recurrent Unit (GRU) cells. Each cell has a different structure and implies a different computational cost. However, it is not clear why and when to use each RNN-cell structure. Actually, there is no comprehensive characterization of all the possible time series behaviors and no guidance on what RNN cell structure is the most suitable for each behavior. The objective of this study is twofold: it presents a comprehensive taxonomy of almost all time series behaviors and provides insights into the best RNN cell structure for each time series behavior. We conducted two experiments: (1) We evaluate and analyze the role of each component in the LSTM-Vanilla cell by creating 11 variants based on one alteration in its basic architecture (removing, adding, or substituting one cell component). (2) We evaluate and analyze the performance of 20 possible RNN-cell structures. To evaluate, compare, and select the best model, different statistical metrics were used: error-based metrics, information criterion-based metrics, naive-based metrics, and direction change-based metrics. To further improve our confidence in the models interpretation and selection, the Friedman Wilcoxon-Holm signed-rank test was used. Our results advocate the usage and exploration of the newly created RNN variant, named SLIM, in time series forecasting thanks to its high ability to accurately predict the different time series behaviors, as well as its simple structural design that does not require expensive temporal and computing resources.


MT-Bench-101: A Fine-Grained Benchmark for Evaluating Large Language Models in Multi-Turn Dialogues

Bai, Ge, Liu, Jie, Bu, Xingyuan, He, Yancheng, Liu, Jiaheng, Zhou, Zhanhui, Lin, Zhuoran, Su, Wenbo, Ge, Tiezheng, Zheng, Bo, Ouyang, Wanli

arXiv.org Artificial Intelligence

The advent of Large Language Models (LLMs) has drastically enhanced dialogue systems. However, comprehensively evaluating the dialogue abilities of LLMs remains a challenge. Previous benchmarks have primarily focused on single-turn dialogues or provided coarse-grained and incomplete assessments of multi-turn dialogues, overlooking the complexity and fine-grained nuances of real-life dialogues. To address this issue, we introduce MT-Bench-101, specifically designed to evaluate the fine-grained abilities of LLMs in multi-turn dialogues. By conducting a detailed analysis of real multi-turn dialogue data, we construct a three-tier hierarchical ability taxonomy comprising 4208 turns across 1388 multi-turn dialogues in 13 distinct tasks. We then evaluate 21 popular LLMs based on MT-Bench-101, conducting comprehensive analyses from both ability and task perspectives and observing differing trends in LLMs performance across dialogue turns within various tasks. Further analysis indicates that neither utilizing common alignment techniques nor chat-specific designs has led to obvious enhancements in the multi-turn abilities of LLMs. Extensive case studies suggest that our designed tasks accurately assess the corresponding multi-turn abilities. The data and code are available at \url{https://github.com/mtbench101/mt-bench-101}.


Beyond Boundaries: Learning a Universal Entity Taxonomy across Datasets and Languages for Open Named Entity Recognition

Yang, Yuming, Zhao, Wantong, Huang, Caishuang, Ye, Junjie, Wang, Xiao, Zheng, Huiyuan, Nan, Yang, Wang, Yuran, Xu, Xueying, Huang, Kaixin, Zhang, Yunke, Gui, Tao, Zhang, Qi, Huang, Xuanjing

arXiv.org Artificial Intelligence

Open Named Entity Recognition (NER), which involves identifying arbitrary types of entities from arbitrary domains, remains challenging for Large Language Models (LLMs). Recent studies suggest that fine-tuning LLMs on extensive NER data can boost their performance. However, training directly on existing datasets faces issues due to inconsistent entity definitions and redundant data, limiting LLMs to dataset-specific learning and hindering out-of-domain generalization. To address this, we present B2NERD, a cohesive and efficient dataset for Open NER, normalized from 54 existing English or Chinese datasets using a two-step approach. First, we detect inconsistent entity definitions across datasets and clarify them by distinguishable label names to construct a universal taxonomy of 400+ entity types. Second, we address redundancy using a data pruning strategy that selects fewer samples with greater category and semantic diversity. Comprehensive evaluation shows that B2NERD significantly improves LLMs' generalization on Open NER. Our B2NER models, trained on B2NERD, outperform GPT-4 by 6.8-12.0 F1 points and surpass previous methods in 3 out-of-domain benchmarks across 15 datasets and 6 languages.


QFMTS: Generating Query-Focused Summaries over Multi-Table Inputs

Zhang, Weijia, Pal, Vaishali, Huang, Jia-Hong, Kanoulas, Evangelos, de Rijke, Maarten

arXiv.org Artificial Intelligence

Table summarization is a crucial task aimed at condensing information from tabular data into concise and comprehensible textual summaries. However, existing approaches often fall short of adequately meeting users' information and quality requirements and tend to overlook the complexities of real-world queries. In this paper, we propose a novel method to address these limitations by introducing query-focused multi-table summarization. Our approach, which comprises a table serialization module, a summarization controller, and a large language model (LLM), utilizes textual queries and multiple tables to generate query-dependent table summaries tailored to users' information needs. To facilitate research in this area, we present a comprehensive dataset specifically tailored for this task, consisting of 4909 query-summary pairs, each associated with multiple tables. Through extensive experiments using our curated dataset, we demonstrate the effectiveness of our proposed method compared to baseline approaches. Our findings offer insights into the challenges of complex table reasoning for precise summarization, contributing to the advancement of research in query-focused multi-table summarization.